Extracting, Linking and Integrating Data from Public Sources: A Financial Case Study

Authors

  • Douglas Burdick
  • Mauricio A. Hernández
  • C. T. Howard Ho
  • Georgia Koutrika
  • Rajasekar Krishnamurthy
  • Lucian Popa
  • Ioana Stanoi
  • Shivakumar Vaithyanathan
  • Sanjiv R. Das
Abstract

We present Midas, a system that uses complex data processing to extract and aggregate facts from a large collection of structured and unstructured documents into a set of unified, clean entities and relationships. Midas focuses on data for financial companies and is based on periodic filings with the U.S. Securities and Exchange Commission (SEC) and Federal Deposit Insurance Corporation (FDIC). We show that, by using data aggregated by Midas, we can provide valuable insights about financial institutions either at the whole system level or at the individual company level. The key technology components that we implemented in Midas and that enable the various financial applications are: information extraction, entity resolution, mapping and fusion, all on top of a scalable infrastructure based on Hadoop. We describe our experience in building the Midas system and also outline the key research questions that remain to be addressed towards building a generic, high-level infrastructure for large-scale data integration from public sources.


Similar resources

Integrating the Population Perspective into Health System Performance Assessment (IPHA): Study Protocol for a Cross-Sectional Study in Germany Linking Survey and Claims Data of Statutorily and Privately Insured

Background: Health system performance assessment (HSPA) is a major tool for evidence-based governance in health systems and patient/population-orientation is increasingly considered as an important aspect. The IPHA study aims (1) to undertake a comprehensive performance assessment of the German health system from a population perspec...


Designing a new multi-objective fuzzy stochastic DEA model in a dynamic environment to estimate efficiency of decision making units (Case Study: An Iranian Petroleum Company)

This paper presents a new multi-objective fuzzy stochastic data envelopment analysis (MOFS-DEA) model under mean chance constraints and common weights to estimate the efficiency of decision making units for their future financial periods. In the initial MOFS-DEA model, the outputs and inputs are characterized by random triangular fuzzy variables with normal distribution, in which ...


Application of Big Data Analytics in Power Distribution Network

The smart grid enhances optimization in the generation, distribution and consumption of electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, the most common being the deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...


Assessment of Public Hospital Governance in Romania: Lessons From 10 Case Studies

Background: The Government of Romania commissioned international technical assistance to help unpack the causes of arrears in selected public hospitals. Emphasis was placed on the governance-related determinants of hospital performance in the context of the Romanian health system. Methods: The assessment was structured around a public hospital governance framewor...


Efficiency Evaluation by using mixed modeling of Data Envelopment Analysis and Balanced Scorecard- A Case Study in the banking industry

The first objective in any financial organization is to improve performance, and performance evaluation is also one of the best ways to advance operations in organizations. By utilizing different methods of performance evaluation, organizations can evaluate the effectiveness and efficiency of processes that are in accord with strategic objectives. In addition, the performance evaluation instrum...



Journal:
  • IEEE Data Eng. Bull.

Volume 34


Publication date: 2011